As is well known, diabetes is a chronic (long-lasting) health condition that affects how your body turns food into energy. It causes more deaths per year than breast cancer and AIDS combined, and people with diabetes are at increased risk of serious health complications including premature death, vision loss, heart disease, stroke, kidney failure, and amputation of toes, feet, or legs.
Although it is truly can be treated as a severe disease and worth people’s attention. However, according to National Diabetes Statistics Report, about 7.2 million (23.8%) adults in the US were not aware of or did not report having diabetes.Thus, we want to do some research and analysis about this disease to see what are the main reasons for causing diabetes and if what can we do for controling and preventing from it.
After considering potential influencing factors of diabetes, we are both responsible for collecting data in terms of different factors. Data are all from the original sources and mostly from “Centers for Disease Control and Prevention” and “Bureau of Economic Analysis” from U.S. Department of Commerce. In the form of getting the data, we either directly downloaded excels (or csv) or scraped data from the web pages.
Factors causing diabetes are grouped by several categories including physical indicators, age, education, diet habits, alcohol, and per capita GDP. Records are based on 53 states of US. In choosing data types, we do have total number and percentage of each factor (variable) for each state; however, every state has different amount of population, we will consider percentages of the total number of each factor among total population as our metrice.
For physical indicators, High Cholesterol, Hypertension, Inactivity, Obesity, Poor Health and Smoking are counted.
Age will be classified in 4 categories which are 18-44, 45-64, 65-74 and 75+. In each classification, we have percentages of diagnosed diabetes of each age range for 53 states.
Since education will enhance one’s self-management ability and enrich scientific knowledges so that we wonder if education levels would be a factor and how much it will be related to diabetes. Education levels are specified by three types including under high school, high school and post high school. Each state will have a percentage of diabete rates corresponding to each education levels.
In order to consider diet habits that affacting diabetes, we took sweet beverage consumption among adults and the number of McDonald’s location among states as metrices. Unfortunately, we are only able to obtain data of the first kind (sweet beverage consumption) from 24 states.
Alcohol is a kind of popular lifestyle in US which let us to think about whether excessive alcohol consumption is one of major factor causing diabetes in the long run.
Moreover, Per capita GDP is also important indicator for representing economic condition of each state, which might potentially affacts the chance of being diabetes.
For the parts of data that directly downloading excel or csv from orginial websites, we used read.csv() to import into R for further data processing. Those data are including High Cholesterol, Hypertension, Inactivity, Obesity, Poor Health, Smoking, age, education, per capita GDP and number of McDonald’s location.
On the other part of data that does not have nice downloadable excel or csv, we used packages tidyverse and rvest to scrap tables from official webpages and saved those tables into our .data/raw in .csv format. Those data include alcohol and sweet beverage consumption.
We mainly used pipeline operator methods like select, filter, to clean up the dataset and just leave the necessary part and also rename the columns with meaningful names. For some plots, we also need to narrow the dataset using gather or categorize numerical data into several categorical groups. All numeric related data are cleaned from character or factor type to double or integer form for being ready to later analysis.
Do we have missing values for datasets that involving all the variables?
Yes, we do. Here, we visualized missing patterns using the visna function in the extracat package. From above plot, we realized that there are 8 missing patterns by each row, and most columns have some kinds of missing values. By sorting both rows and columns, sweet beverage consumption related variables (NoSweetBev, less1perDay, more1perDay) have the most missing values, and we observed the top two most common missing patterns.
Dealing with missing values and Save as RData for convenience
For those States with low population like Virginia Island, Hawaii, we just simply eliminated those data from the dataframe for reducing the effects of potential outliers. However, for data as Sweet Beverage, we were only able to find the data for 24 states, which caused a lot of missing values compared with other datasets have 50+ states. Hence, the only way of dealing with this kind of situation was to separate it from the whole big dataset in case that it affects other analysis.
Diagnosed Number Against Every Risk Factor
Six potential risk factors above approximately show existing positively correlation with diagnosed Diabetes. Even though proportion of Physical Inactivity and Smoking is under 50% in all diagnosed diabetes, the overall trend shows that they do increase as proportion of diabetes increases.
The rest factors including Hypertension, High Cholesterol, Obesity and Poor General Health all showing a growing trend, which tells us that as proportion of diagnosed diabetes increases, the proportion of those risk factors increase as well. Therefore, all these risk factors do have influences on causing diabetes to some degree in the long run.
Now look at these factors using other graph pattern to get more information.
DotPlot
We drawed Dot Plot for exploring other pattern or information about six risk factors. Here, the percentage distribution of each factor displayed clearly. We observe that among the proportion of diabetes population, obesity exists in a high percentage above 75% and smoking exists in a very low percentage below 40%, which means among all people who have diabetes, over 75% of them are overweight or obesity and under 40% of them have smoking habit. Population with diabetes are very likely to have obesity and are relatively less likely to smoke.
From this plot, we can also figure out that Obesity, Hypertension and High Cholesterol are the top 3 risk factors which diabetes contingent on. Clearly, all of them are closely related to people’s lifestyle and eating habits. An assumption that high-calorie food lovers are more easily to get diabetes than those who prefer healthy and balanced diet.
Looking into sugary and oily food(such as junk food) consumption data in each states may helps giving the answer for this assumption.
Fast, Fried/Oily food (eg. Mcdonald’s) Analysis
Considering that diet habits specially fast foods could be an important factor for diabetes, we are able to draw a cleveland dot pot to make comparion of number of McDonald’s location among all states. The states were ordered by percentage of diagnosed diabetes for easy reading.
Even though there is no obvious clear pattern to show the relation between diabetes and number of McDonald’s location, we are still able to observe a slightly general trend states with low diabetes proportion are also has less number of McDonald’s per million people. Therefore we decide to draw another plot to see if this is the case.
We divided states into five ranges in terms of ranking in diabetes proportion, and at the same time, the number of McDonalds are classified in 3 categories (less than 44 McDonalds per 1 million people, 45-54 McDonalds per 1 million people, greater than 55 McDonalds per 1 million people). The top 20 diabetes states has much larger proportion on 55+ McDonalds comparing with the bottom 20 diabetes states.
Summary: Food with high calories can contribute to increasing the risk of getting diabetes. Now we are interested in “Does drinking sweet beverages increases the possibility of diabetes?”
Sweet Beverage Analysis
The above cleveland dot plot giving the answer is Yes. When the states have larger proportion of diabetes, the lower percentage people do not drink sweet beverage at all and higher percentage people drink more than 1 per day. In other words, percentage of more than 1 per day increases as the state’s number of diabetes increases; percentages of less than 1 per day and no sweet beverage decreases as the state’s number of diabetes increases. Summary: We believe that sweet beverage consumption does cause diabetes.
Let’s see another plot to ensure this result.
Six potential risk factors above approximately show existing positively correlation with diagnosed Diabetes. Even though proportion of Physical Inactivity and Smoking is under 50% in all diagnosed diabetes, the overall trend shows that they do increase as proportion of diabetes increases.
The rest factors including Hypertension, High Cholesterol, Obesity and Poor General Health all showing a growing trend, which tells us that as proportion of diagnosed diabetes increases, the proportion of those risk factors increase as well. Therefore, all these risk factors do have influences on causing diabetes to some degree in the long run.
Now look at these factors using other graph pattern to get more information.
DotPlot
We drawed Dot Plot for exploring other pattern or information about six risk factors. Here, the percentage distribution of each factor displayed clearly. We observe that among the proportion of diabetes population, obesity exists in a high percentage above 75% and smoking exists in a very low percentage below 40%, which means among all people who have diabetes, over 75% of them are overweight or obesity and under 40% of them have smoking habit. Population with diabetes are very likely to have obesity and are relatively less likely to smoke.
From this plot, we can also figure out that Obesity, Hypertension and High Cholesterol are the top 3 risk factors which diabetes contingent on. Clearly, all of them are closely related to people’s lifestyle and eating habits. An assumption that high-calorie food lovers are more easily to get diabetes than those who prefer healthy and balanced diet.
Looking into sugary and oily food(such as junk food) consumption data in each states may helps giving the answer for this assumption.
Fast, Fried/Oily food (eg. Mcdonald’s) Analysis
Considering that diet habits specially fast foods could be an important factor for diabetes, we are able to draw a cleveland dot pot to make comparion of number of McDonald’s location among all states. The states were ordered by percentage of diagnosed diabetes for easy reading.
Even though there is no obvious clear pattern to show the relation between diabetes and number of McDonald’s location, we are still able to observe a slightly general trend states with low diabetes proportion are also has less number of McDonald’s per million people. Therefore we decide to draw another plot to see if this is the case.
We divided states into five ranges in terms of ranking in diabetes proportion, and at the same time, the number of McDonalds are classified in 3 categories (less than 44 McDonalds per 1 million people, 45-54 McDonalds per 1 million people, greater than 55 McDonalds per 1 million people). The top 20 diabetes states has much larger proportion on 55+ McDonalds comparing with the bottom 20 diabetes states.
Summary: Food with high calories can contribute to increasing the risk of getting diabetes. Now we are interested in “Does drinking sweet beverages increases the possibility of diabetes?”
Sweet Beverage Analysis
The above cleveland dot plot giving the answer is Yes. When the states have larger proportion of diabetes, the lower percentage people do not drink sweet beverage at all and higher percentage people drink more than 1 per day. In other words, percentage of more than 1 per day increases as the state’s number of diabetes increases; percentages of less than 1 per day and no sweet beverage decreases as the state’s number of diabetes increases.
Summary: We believe that sweet beverage consumption does cause diabetes.
Let’s see another plot to ensure this result.
This line plot also provided the evidence that indicating sweet drinks contribute to development of diabetes. The more diabetes states have (purple line), the larger percentage of more1perDay (green line) they have, and the less percentage of less1perDay (red line) and NoSweetBev (blue line) they have. Conclusion: Drinking sweet beverages seems to be an important factor to become diabetes.
Drinking alcohol is also a bad diet habit which can cause lots of healthy problems, such as obesity. Thus, we got the data of excessive alcohol drinking among all states to see if it has any effects on diabetes.
Excessive Alcohol Analysis
Originally, we presumed that alcohol will definitely increase the chance of being diabetes. Here we use horizontal bar plot to explore the relation between diabetes and excessive alcohol assumption. We ranked the states in the order from highest diabetes proportion in the top to lowest diabetes proportion in the bottom. Surprisingly, it appears no clear pattern or even a trend which approximately shows that number of excessive drinks per capita increases as number of diagnosed diabetes decreases that opposing against our original belief. However, after reading the papers and other materils about diabetes, it was found that excessive drinking may indeed causes a decrease in blood sugar, but this does not mean that excessive drinking can help relieve diabetes, even though the amount and types of alcohol might relate to diabetes in some way which requires for further researching.
There are also other common and inevitable risk factors like age and gender. According to the research and background information we have, gender is not quite influential in diabetes, but age does. Hence, we analyzed on Age factor.
AGE Analysis
In this parallel coordinate plot of age, each color line represents a specific age range, and each age range that having diabetes takes percentage from total diabetes in that state. It clearly shows that the percentage of being diabetes increases as age increases. Population in age 18-44 with diabetes have very low percentage (below 5%) on total diabetes and less fluctuation in each state. Over 45, proportion of diabetes becomes very fluctuant that really depending on states. Most states’ diabetes population with 45-64 age are within 10% - 20%. The highest percentage of diabetes are located in age over 65, particularly between 65-74. This makes sense since the metabolic function of the pancreas becomes degenerated as age increases, which triggers the higher possibility to have diabetes.
Other Interesting Findings
Education
Let’s see how education levels affect diabetes! In the cleveland dot pot above, we ranked states from highest diabetes proportion to lowest, three types of education levels in different colors are almost clearly distinguished from each other. Among the diabetes population, the higher level of the education was gained, the lower percentage of diabetes the group would have. Represented by percentage numbers in the plot, we understand diabetes population with post high school education are only below 12%, diabetes population with high school education range from 6.8% to 16% with fluctuation, and those population with under high school education have most fluctuant percentages among states from 9.7% to 24.1%. Generally speaking, people with higher education level would pursue and possess a healthy lifestyle and great diet habits, which will prevent them from being diabetes to some degree.
GDP Analysis
Per Capita GDP is an important indicator of economic condition. With higher per capita GDP, it seems that people are more capable of improving life standard and preventing themselves from being diabetes. This dot pot displays three kinds of colors, blue dots representing top 10 states with largest number of diabetes, green dots representing middle states and red dots showing the bottom 10 states with smallest number of diabetes. It seems clear that the top 10 states has very low per capita GDP below 50000. The rest of states have relatively higher per capita GDP and fluctuant ranges.
An overall correlation between every two factors from per capita GDP, sweet beverages and fast food (McDonald) can be shown from above pairs correlation plot. The higher the per capita GDP is, the less number of McDonald’s in that state. Moreover, the higher the per capita GDP is, the less likely people drink sweet beverages. In a sense, with higher per capita GDP condition, people tends to healthier eating habits (less fastfood and sweet beverages), which reduces the chance of diagnosed diabetes.
Shiny App:
https://xc2464.shinyapps.io/shiny/
For more graphic patterns such as heatmap, they are shown in shiny app.
GitHub Repo
https://github.com/XinyiChenJ/Graphics_Final.git
Shiny App:
https://xc2464.shinyapps.io/shiny/
For more graphic patterns such as heatmap, they are shown in shiny app.
>>>>>>> 1dcbc0aa4e7c43338cda49f94ff0a6820bf65684Limitations Factor triggering diabetes is a very interesting research and has pratical meaning. It definitely deserves for further researching. In our project, we found many factors that positively related to diabetes, such as obesity, diet habits, etc. Also, we surprisingly observed that excessive alcohol is negatively related to diabetes and it really depdends on types and the amount of the alcohol. However, limited data stopped us from finding a clearer pattern and results. We are only able to get data for 2015 and some of datasets only include 24 states. In this situation, we could not prevent errors from the limitation of data.
Future Direction For future direction, eating habits seem to be very important so that we can do more research in this area, which can give people a lot of advice to avoid diabetes. For example, the GI value of food, whether eating red meat and white meat will have an effect, whether sweet beverages should be replaced by coffee or tea, and so on. We will also do more relative research and find more comprehensive data. Right now, we only explored physical factors that causing diabetes, but mental health might have an impact as well. Furthermore, family inheritance: which body type or disease may be inherited and causes a higher risk of diabetes for the next generation. Equally important is that what can prevent diabetes.
Things Learned The lessons that we learned from this project are that healthy diet habits and working out can not be ignored in daily life. People should reduce sugar consumption and drink water as the primary beverage. It is necessary to lose weight if you are overweight, and quit smoking might also be something you should consider.
https://gis.cdc.gov/grasp/diabetes/DiabetesAtlas.html
https://www.statista.com/statistics/631235/number-of-mcdonald-s-us-by-state/
https://www.cdc.gov/alcohol/data-stats.htm
https://www.cdc.gov/mmwr/volumes/65/wr/mm6507a1.htm
https://apps.bea.gov/itable/iTable.cfm?ReqID=70&step=1#reqid=70&step=1&isuri=1